Skip to content

make dial timeout per-retry#3705

Open
mwhooker wants to merge 5 commits intoredis:masterfrom
mwhooker:dial-fix
Open

make dial timeout per-retry#3705
mwhooker wants to merge 5 commits intoredis:masterfrom
mwhooker:dial-fix

Conversation

@mwhooker
Copy link
Contributor

@mwhooker mwhooker commented Feb 6, 2026

We're noticing an issue where we only ever see
redis: connection pool: failed to dial after 1 attempts: dial tcp x.x.x.x:6379: i/o timeout.

Looking at existing code, it looks like it will never get to a dial retry, because it sets the context timeout to dialtimeout for the entire retry loop. This means that a single dial is the same timeout as all of the dials.

This patch changes behavior so the dial timeout applies dial attempt, rather than all attempts together.

Use dial timeout for each dial attempt, rather than
all attempts together.
@jit-ci
Copy link

jit-ci bot commented Feb 6, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

@ndyakov
Copy link
Member

ndyakov commented Feb 6, 2026

Hello @mwhooker, thank you for opening this one. I was taking a look at this some time ago and was wondering what would be a users expectation. If there is a dial timeout, should it include the total amount of time for a connection to be established (including retries) or should it be just for a single dial. Based on your PR you would expect this to be per single connection, but does it make sense to have one that is for the whole period?

@mwhooker
Copy link
Contributor Author

mwhooker commented Feb 6, 2026 via email

@maulin-vasavada
Copy link

I think this change makes sense since this simplifies library's user expectation. dialTimeout config always means per dial/connection call. So math of total timeout based on retryCount is easier to do on the user side before choosing values. Currently, prior to this change, I feel it makes it complicated for users to choose because I can imagine when tweaking values for more retries (during prod incident resolution etc) it may be easily forgotten that all retries have to cumulatively meet single dialTimeout config value. To me that is counter-intuitive. Also, this may bring Redis go lib to parity with many other client libs like redis-py and provide consistency for users.

@ndyakov
Copy link
Member

ndyakov commented Feb 10, 2026

@maulin-vasavada we do have a slightly different way of opening connection and retrying compared to redis-py, but i do agree that this can be confusing. will have to run this with our e2e tests around end of week to verify it works as expected in different scenarious.

@ndyakov
Copy link
Member

ndyakov commented Feb 17, 2026

Sorry for the delay, the release we just did took longer than expected. Anyway, this week I will have some cycles to look at this and get back to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants